Formula One
Robust Agents in Open-Ended Worlds
The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world. A key challenge is ensuring these AI agents are robust, excelling not only in familiar settings observed during training but also effectively generalising to previously unseen and varied scenarios. In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents capable of generalising to novel environments, out-of-distribution inputs, and interactions with other co-player agents. We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation. Based on the game of NetHack, MiniHack enables the construction of new tasks for reinforcement learning (RL) agents with a focus on generalisation. We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games. We further probe robustness in multi-agent domains, utilising quality-diversity methods to systematically identify vulnerabilities in state-of-the-art, pre-trained RL policies within the complex domain of video-game football, characterised by intertwined cooperative and competitive dynamics. Finally, we extend our exploration of robustness to large language models (LLMs). Here, our focus is on diagnosing and enhancing the robustness of LLMs against adversarial prompts, employing evolutionary search to generate a diverse range of effective inputs that aim to elicit undesirable outputs from an LLM. This work collectively paves the way for future advancements in AI robustness, enabling the development of agents that not only adapt to an ever-evolving world but also thrive in the face of unforeseen challenges and interactions.
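The evolutionary search described in the final thread can be sketched as a simple mutate-score-select loop. The fitness function below is a toy stand-in (in the thesis an LLM-based judge would score elicited outputs), and all names are illustrative, not the thesis's implementation:

```python
import random

def fitness(prompt):
    # Toy stand-in scorer: in practice an LLM judge would rate how strongly
    # the target model's response to `prompt` is undesirable. Here we reward
    # token variety purely so the loop has something to optimise.
    return len(set(prompt.split()))

def mutate(prompt, vocab, rng):
    # Replace one token with a random vocabulary word.
    tokens = prompt.split()
    i = rng.randrange(len(tokens))
    tokens[i] = rng.choice(vocab)
    return " ".join(tokens)

def evolve(seed_prompts, vocab, generations=20, pop_size=8, rng=None):
    rng = rng or random.Random(0)
    population = list(seed_prompts)
    for _ in range(generations):
        parent = max(population, key=fitness)       # select the fittest
        child = mutate(parent, vocab, rng)          # vary it
        population.append(child)
        population.sort(key=fitness, reverse=True)  # cull the weakest
        population = population[:pop_size]
    return max(population, key=fitness)
```

Because the fittest individual is always retained, the best score is non-decreasing across generations, which is the property diversity-seeking variants then trade off against coverage.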
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
- North America > United States > California > San Francisco County > San Francisco (0.13)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.13)
- (34 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Instructional Material (1.00)
- Research Report > Promising Solution (0.65)
- Leisure & Entertainment > Sports > Motorsports > Formula One (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology > Security & Privacy (1.00)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- (4 more...)
4 billion equations calculated for F1 team during race weekend
Nearly 800 sensors feed data back to an operations center that helps the Oracle Red Bull crew make split-second decisions. Formula One is unquestionably fast. The motorsport's multi-million-dollar cars achieve speeds of over 210 miles per hour on tracks that bend and twist wildly.
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- Europe > United Kingdom > England > Buckinghamshire > Milton Keynes (0.05)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Hungary > Budapest > Budapest (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- (2 more...)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- Information Technology (1.00)
- Media (0.67)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs
Sato, Takuma, Kawano, Seiya, Yoshino, Koichiro
The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This study demonstrates that providing language models with pragmatic theories as prompts is an effective in-context learning approach for tasks that require understanding implied meanings. Specifically, we propose an approach in which an overview of pragmatic theories, such as Gricean pragmatics and Relevance Theory, is presented as a prompt to the language model, guiding it through a step-by-step reasoning process to derive a final interpretation. Experimental results showed that, compared to the baseline, which prompts intermediate reasoning without presenting pragmatic theories (0-shot Chain-of-Thought), our methods enabled language models to achieve up to 9.6% higher scores on pragmatic reasoning tasks. Furthermore, we show that even without explaining the details of pragmatic theories, merely mentioning their names in the prompt yields a modest performance improvement (around 1-3%) in larger models compared to the baseline.
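The prompting scheme can be sketched as a simple template: a theory overview followed by the task and a step-by-step instruction. The wording below is illustrative, not the authors' exact prompt:

```python
GRICEAN_OVERVIEW = (
    "Gricean pragmatics: speakers follow the Cooperative Principle and four "
    "maxims (Quantity, Quality, Relation, Manner); an apparent violation of "
    "a maxim signals an implicature the listener should infer."
)

def build_prompt(utterance, context, theory=GRICEAN_OVERVIEW):
    """Compose a pragmatic-theory-guided chain-of-thought prompt."""
    return (
        f"{theory}\n\n"
        f"Context: {context}\n"
        f'Utterance: "{utterance}"\n'
        "Using the theory above, reason step by step about which maxim is "
        "flouted and what the speaker implies, then state the final "
        "interpretation."
    )
```

The 0-shot Chain-of-Thought baseline corresponds to the same template with the theory overview omitted.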
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Leisure & Entertainment > Sports > Motorsports > IndyCar (0.40)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.40)
EnvTrace: Simulation-Based Semantic Evaluation of LLM Code via Execution Trace Alignment -- Demonstrated at Synchrotron Beamlines
van der Vleuten, Noah, Flores, Anthony, Mathur, Shray, Rakitin, Max, Hopkins, Thomas, Yager, Kevin G., Tsai, Esther H. R.
Evaluating large language models (LLMs) for instrument control requires methods that go beyond standard, stateless algorithmic benchmarks, since the behavior of physical systems cannot be fully captured by unit tests alone. Here we introduce EnvTrace, a simulation-based method that evaluates execution traces to assess semantic code equivalence. EnvTrace is demonstrated with a beamline control-logic digital twin to facilitate the evaluation of instrument control code, with the digital twin itself also enabling the pre-execution validation of live experiments. Over 30 LLMs were evaluated using trace alignment to generate a multi-faceted score for functional correctness across key behavioral dimensions, showing that many top-tier models can approach human-level performance in rapid control-code generation. This is a first step toward a broader vision where LLMs and digital twins work symbiotically: LLMs providing intuitive control and agentic orchestration, and digital twins offering safe and high-fidelity environments, paving the way towards autonomous embodied AI.
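The core trace-alignment idea can be approximated in a few lines: record each call the code makes against the simulated environment as an event, then align the reference and candidate event sequences. The event names and the single similarity score below are invented for illustration; EnvTrace's actual multi-faceted score is richer:

```python
from difflib import SequenceMatcher

def trace_score(reference, candidate):
    """Similarity in [0, 1] between two execution traces, each a list of
    (instruction, params) events recorded while running control code
    against a digital twin."""
    # Render each event as a hashable string so SequenceMatcher can align them.
    ref = [f"{op}({params})" for op, params in reference]
    cand = [f"{op}({params})" for op, params in candidate]
    return SequenceMatcher(None, ref, cand).ratio()

# Hypothetical traces from reference vs. LLM-generated beamline code.
ref = [("move_motor", "x=1.0"), ("open_shutter", ""), ("expose", "t=0.5")]
good = [("move_motor", "x=1.0"), ("open_shutter", ""), ("expose", "t=0.5")]
bad = [("open_shutter", ""), ("expose", "t=5.0")]
```

Scoring traces rather than source text is what makes the evaluation semantic: two syntactically different programs that drive the instrument through the same states score identically.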
- North America > United States (0.28)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
- Europe > Montenegro (0.04)
- Energy (0.93)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.40)
- North America > United States (0.05)
- Europe > Russia (0.05)
- Europe > Monaco (0.05)
- (17 more...)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (3 more...)
- Education (0.94)
- Leisure & Entertainment > Games (0.93)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.46)
Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1
Karagulle, Ruya, Vasile, Cristian-Ioan, Ozay, Necmiye
Autonomous systems increasingly rely on human feedback to align their behavior, expressed as pairwise comparisons, rankings, or demonstrations. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach to solve the learning problem from preferences, rankings, or demonstrations using Weighted Signal Temporal Logic (WSTL). WSTL learning problems, when implemented naively, lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast the problem as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method effectively captures nuanced preferences and models complex task objectives. Autonomous systems are increasingly part of our daily lives, from driverless cars in urban navigation to household robots performing domestic chores. Since these systems operate closely alongside humans, learning from human feedback is a natural way to ensure their behaviors align with human desires.
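The log-transform step the abstract mentions can be illustrated on a single product constraint: for strictly positive weights, a multi-linear constraint over products of weights becomes linear in the log-weights. This is a toy version of the idea, not the authors' MILP formulation:

```python
import math

def linearize_product_constraint(weights, bound):
    """Illustrative log-transform: for positive weights w_i and bound c > 0,
    the multi-linear constraint prod(w_i) >= c is equivalent to the linear
    constraint sum(log w_i) >= log c in the variables v_i = log w_i."""
    assert all(w > 0 for w in weights) and bound > 0
    lhs = sum(math.log(w) for w in weights)
    return lhs >= math.log(bound)
```

Because log is strictly increasing, the transformed constraint is satisfied exactly when the original is, which is what lets the learning problem be recast as a Mixed-Integer Linear Program without losing guarantees.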
- North America > United States > Michigan (0.04)
- North America > United States > Massachusetts (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
- (3 more...)
DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning
Shi, Wenxuan, Tan, Haochen, Kuang, Chuqiao, Li, Xiaoguang, Ren, Xiaozhe, Zhang, Chen, Chen, Hanting, Wang, Yasheng, Hou, Lu, Shang, Lifeng
Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing prompting and supervised fine-tuning (SFT) methods remain fixed by prompt rules or training corpora, and are usually benchmarked only on well-structured wiki sources, limiting real-world adaptability. We introduce WebPuzzle, a 24k-sample training and 275-sample test benchmark that evaluates information seeking on the live internet, across both wiki and open-domain queries. Leveraging 7k WebPuzzle instances, we develop DeepDiver, a reinforcement-learning (RL) framework that cultivates Search Intensity Scaling (SIS), an emergent ability to escalate search frequency and depth instead of settling on overconfident, under-evidenced answers. With SIS, Qwen2.5-7B-Instruct and Pangu-7B-Reasoner attain performance on real-web tasks comparable to the 671B-parameter DeepSeek-R1. We detail DeepDiver's curriculum from cold-start SFT to a carefully designed RL procedure, and show that its seeking policy generalizes from closed-ended queries to open-ended generation such as long-form writing. Our results advance adaptive information seeking in LLMs and provide a rigorous benchmark for future work.
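Behaviourally, Search Intensity Scaling can be caricatured as a loop that escalates search breadth each time the current evidence is insufficient, rather than committing to an under-evidenced answer. The stubs and stopping rule below are invented for illustration; the actual policy is learned with RL:

```python
def seek(question, search, answer, max_rounds=5):
    """Iterative seeking loop: widen the search each unsatisfying round.
    `search(question, num_queries)` returns retrieved documents and
    `answer(question, evidence)` returns (guess, confident) -- both are
    caller-supplied stubs standing in for the model's learned behaviour."""
    evidence = []
    for round_no in range(1, max_rounds + 1):
        # Search intensity grows with each round that ends unconfident.
        evidence += search(question, num_queries=round_no)
        guess, confident = answer(question, evidence)
        if confident:
            return guess, round_no
    return guess, max_rounds  # budget exhausted: return best effort
```

A hypothetical usage: with a retriever that returns one document per query and an answerer that is confident once three documents are gathered, the loop stops after two rounds (1 + 2 documents).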
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)